IMAGE PROCESSING APPARATUS

This invention relates to an apparatus and method of
operation of a processor for generating model data for a
model in a three-dimensional space from image data
representative of a set of camera images of an object.

It is known from EP-A-0898245 to process images of the
object taken from different, unknown positions using a
matching process in which points in different images
which correspond to the same point of the actual object
are matched, the matching points being used to determine
the relative positions and orientations of cameras from
which the images were taken and to then generate model
data. This process of determining the camera positions
is referred to as calculating a camera solution, and
EP-A-0898245 discloses a camera solution process relying upon
epipolar geometry between virtual image planes of cameras
at camera positions from which corresponding images were
obtained.

Having solved the camera positions and orientations for
an initial three cameras corresponding to an initial
three images in a sequence of camera images using a first
solution algorithm, EP-A-0898245 teaches that each new
image of the sequence of images requires its camera
solution to be obtained using a second camera solution
algorithm which assumes the camera solution for the
preceding image in the sequence to be accurately known
from previous calculations. Matching points between the
new image and the preceding images in the sequence may
then be processed to accumulate further model data.



This known method of camera solution, referred to below
as a 2-D to 2-D camera solution process, effectively
takes as a starting point pairs of co-ordinates in
virtual image planes of a pair of virtual cameras in the
three-dimensional model space and calculates the
parameters defining the position and orientation of each
camera based on these pairs of two-dimensional image
co-ordinates for matching points.

It is an object of the present invention to provide an
apparatus and method for model generation in which the
camera solution process relating to the addition of each
new image is improved.

According to the present invention there is disclosed an
apparatus and method for generating model data without
relying solely upon the 2-D to 2-D camera solution
process. Once an initial sequence of images is processed
and initial model data generated, camera solutions for
subsequent images are calculated by a different process
which utilises the model data.



Preferred embodiments of the present invention will now
be described by way of example only and with reference to
the accompanying drawings, of which:

Figure 1 schematically shows the components of a modular 
system in which the present invention may be embodied; 

Figure 2A is a schematic illustration of apparatus in
accordance with the present invention; 

Figure 2B is a schematic diagram showing the functional 
components of the apparatus of Figure 2A; 


Figure 3A is a schematic diagram showing actual camera 
positions relative to the object; 

Figure 3B is a schematic diagram showing virtual camera
positions relative to the model;

Figure 4 is a diagram illustrating a display screen in 
which camera images are displayed for matching; 

Figure 5 is a schematic diagram illustrating the mapping
of model points into a virtual image plane of a camera;

Figures 6A and 6B are schematic flowcharts illustrating
the overall process for generating model data and
calculating camera solutions;

Figure 7 is a flowchart illustrating the matching process
enabling a provisional camera solution for a new image to 
be performed; 

Figure 8 is a flowchart illustrating operation of a 3D to
2D solving process;

Figure 9 is a schematic diagram of triangles of selected 
points used in calculating candidate camera solutions in 
the process of Figure 8; and 


Figure 10 is a schematic diagram of software modules. 

Figure 1 schematically shows the components of a modular 
system in which the present invention may be embodied. 

These components can be effected as processor-implemented
instructions, hardware or a combination thereof.

Referring to Figure 1, the components are arranged to
process data defining images (still or moving) of one or
more objects in order to generate data defining a
three-dimensional computer model of the object(s).
The input image data may be received in a variety of
ways, such as directly from one or more digital cameras,
via a storage device such as a disk or CD-ROM, by
digitisation of photographs using a scanner, or by
downloading image data from a database, for example via
a datalink such as the Internet, etc.

The generated 3D model data may be used to: display an
image of the object(s) from a desired viewing position;
control manufacturing equipment to manufacture a model of
the object(s), for example by controlling cutting
apparatus to cut material to the appropriate dimensions;
perform processing to recognise the object(s), for
example by comparing it to data stored in a database;
carry out processing to measure the object(s), for
example by taking absolute measurements to record the
size of the object(s), or by comparing the model with
models of the object(s) previously generated to determine
changes therebetween; carry out processing so as to
control a robot to navigate around the object(s); store
information in a geographic information system (GIS) or
other topographic database; or transmit the object data
representing the model to a remote processing device for
any such processing, either on a storage device or as a
signal (for example, the data may be transmitted in
virtual reality modelling language (VRML) format over the
Internet, enabling it to be processed by a WWW browser);
etc.

The feature detection and matching module 2 is arranged
to receive image data recorded by a still camera from
different positions relative to the object(s) (the
different positions being achieved by moving the camera
and/or the object(s)). The received data is then
processed in order to match features within the different
images (that is, to identify points in the images which
correspond to the same physical point on the object(s)).

The feature detection and tracking module 4 is arranged
to receive image data recorded by a video camera as the
relative positions of the camera and object(s) are
changed (by moving the video camera and/or the
object(s)). As in the feature detection and matching
module 2, the feature detection and tracking module 4
detects features, such as corners, in the images.
However, the feature detection and tracking module 4 then
tracks the detected features between frames of image data
in order to determine the positions of the features in
other images.

The camera position calculation module 6 is arranged to
use the features matched across images by the feature
detection and matching module 2 or the feature detection
and tracking module 4 to calculate the transformation
between the camera positions at which the images were
recorded and hence determine the orientation and position
of the camera focal plane when each image was recorded.

The feature detection and matching module 2 and the 
camera position calculation module 6 may be arranged to 
perform processing in an iterative manner. That is, 
using camera positions and orientations calculated by the 
camera position calculation module 6, the feature 
detection and matching module 2 may detect and match 
further features in the images using epipolar geometry in 
a conventional manner, and the further matched features 
may then be used by the camera position calculation 
module 6 to recalculate the camera positions and 
orientations.

If the positions at which the images were recorded are 
already known, then, as indicated by arrow 8 in Figure 1, 
the image data need not be processed by the feature 
detection and matching module 2, the feature detection 
and tracking module 4, or the camera position calculation 
module 6. For example, the images may be recorded by 
mounting a number of cameras on a calibrated rig arranged 
to hold the cameras in known positions relative to the 
object(s).



Alternatively, it is possible to determine the positions
of a plurality of cameras relative to the object(s) by
adding calibration markers to the object(s) and
calculating the positions of the cameras from the
positions of the calibration markers in images recorded
by the cameras. The calibration markers may comprise
patterns of light projected onto the object(s). Camera
calibration module 10 is therefore provided to receive
image data from a plurality of cameras at fixed positions
showing the object(s) together with calibration markers,
and to process the data to determine the positions of the
cameras. A preferred method of calculating the positions
of the cameras (and also internal parameters of each
camera, such as the focal length etc.) is described in
"Calibrating and 3D Modelling with a Multi-Camera System"
by Wiles and Davison in the 1999 IEEE Workshop on Multi-View
Modelling and Analysis of Visual Scenes, ISBN 0769501109.

The 3D object surface generation module 12 is arranged to
receive image data showing the object(s) and data
defining the positions at which the images were recorded,
and to process the data to generate a 3D computer model
representing the actual surface(s) of the object(s), such
as a polygon mesh model.

The texture data generation module 14 is arranged to
generate texture data for rendering onto the surface
model produced by the 3D object surface generation
module 12. The texture data is generated from the input
image data showing the object(s).

Techniques that can be used to perform the processing in
the modules shown in Figure 1 are described in
EP-A-0898245, EP-A-0901105, pending US applications 09/129077,
09/129079 and 09/129080, the full contents of which are
incorporated herein by cross-reference, and also in Annex A.

The present invention may be embodied in particular as
part of the camera position calculation module 6.

Figures 2A and 2B illustrate apparatus for use in
carrying out the present invention, the apparatus being
in the form of a desktop computer having a processor 24
with associated random access memory 35 and mass storage
memory 36. Figure 2A illustrates a display monitor 20
which is controlled by the processor 24 and comprises a
display screen 21 for the display of images and for use
in interactively controlling the processor in generating
the model as described below. The random access memory
35 includes a concordance table 38 described below.

A computer mouse 26 used in conjunction with a displayed
cursor provides pointing signals 25 in a conventional
manner and a keyboard 27 is also provided for the input
of user data.
Software for operating the processor 24 may be input to
the processor 24 from a portable storage medium in the
form of a floppy disc 28 via a disc drive 29.

A modem 22 is also connected to the processor 24 for the 
input of signals 23 carrying program code or data 
transmitted over a network such as the internet. 

Images I_n (n = 1 to N) in the form of files of image
data are input to the processor 24 by connecting a
digital camera 30 to an input port 37 of the processor
24.

Figure 3A illustrates the actual positions 30_n of the
camera 30 at which successive images in an ordered
sequence (n = 1 to N) are taken of an object 31. The
sequence is ordered such that, when viewed in plan view 
from above, the successive positions of the camera 30 
move in a progressively anticlockwise direction relative 
to the object 31. 

Figure 3B shows the model 110 in the three-dimensional
space of the model and virtual cameras L_n (n = 1 to N),
each virtual camera L_n being represented by a respective
centre of projection C_n and a virtual image plane 32
spaced from the centre of projection by the focal length
of the camera.
The actual positions 30_n of the camera 30 in Figure 3A
will not in general be known and are therefore calculated
by the camera position calculation module 6 from an
analysis of the images themselves. An initial camera
solution, i.e. calculation of the position and
orientation of the virtual cameras L_n relative to the
model 110 in the co-ordinate system of the model as shown
in Figure 3B, is performed for the initial three camera
images I_1, I_2, I_3 to obtain solutions for virtual cameras
L_1, L_2 and L_3. To perform the calculation, it is
necessary to identify matching points in images I_1 and I_2
and to identify corresponding pairs of matching points in
images I_2 and I_3, thereby establishing data in the
concordance table 38 of matching points across three
images. The camera solution is then calculated using a
process hereafter referred to as a 2-D to 2-D process
which utilises epipolar geometry, i.e. based on the
positions of the matched points in the two-dimensional
images when mapped onto the virtual image planes 32, in
order to deduce the camera transformation.
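The 2-D to 2-D process is attributed above to EP-A-0898245 and its details are not reproduced here. As a rough illustration of the epipolar geometry it relies on, the sketch below estimates a fundamental matrix F from matched pixel co-ordinates in two images using the standard normalised eight-point algorithm, so that each match satisfies x2ᵀ F x1 ≈ 0. This is a swapped-in textbook technique, not necessarily the method of EP-A-0898245; the function names are hypothetical and NumPy is assumed.

```python
import numpy as np


def normalise(pts):
    """Hartley normalisation: move the centroid to the origin and scale so the
    mean distance from it is sqrt(2). Returns homogeneous points and the 3x3 T."""
    centroid = pts.mean(axis=0)
    mean_dist = np.sqrt(((pts - centroid) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T


def eight_point(x1, x2):
    """Estimate F with x2_h^T F x1_h = 0 from N >= 8 matched (u, v) pixel pairs."""
    p1, T1 = normalise(x1)
    p2, T2 = normalise(x2)
    # Each match contributes one row of the linear system A f = 0,
    # where f holds the entries of F in row-major order.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # A valid fundamental matrix has rank 2: zero the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalisation so F applies to the original pixel co-ordinates.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

With the internal camera parameters known, the relative camera pose could then be recovered from the essential matrix K2ᵀ F K1; that step is omitted here.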

A set of model coordinates, representing the model points
which correspond to the matched two-dimensional image
points, is then calculated on the basis of the camera
solution and entered in the concordance table 38.
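The text does not say how the model co-ordinates are computed from the camera solution. One standard choice, assumed here for illustration, is linear (DLT) triangulation: each solved camera's 3x4 projection matrix P_k, together with the matched pixel (u, v) in that camera's image, contributes two linear equations in the homogeneous model point, and the system is solved by SVD. The function name is hypothetical.

```python
import numpy as np


def triangulate(projections, image_points):
    """Linear (DLT) triangulation of one model point.

    projections: list of 3x4 camera matrices P_k = K [R_k | t_k]
    image_points: matching (u, v) pixel co-ordinates, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, image_points):
        # From u = (P[0] . X) / (P[2] . X):  u * P[2] . X - P[0] . X = 0,
        # and likewise for v with P[1].
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]                  # null vector = homogeneous model point
    return X[:3] / X[3]         # back to Euclidean model co-ordinates
```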
Once an initial camera solution from the first triplet of
images I_1, I_2, I_3 has been calculated, a different solving
process is adopted for subsequent virtual cameras L_n
(n > 3) derived from subsequent images I_n in the sequence.
This process utilises the information in the concordance
table 38 to identify new matching points found in each
new image with coordinates of the existing model data.
The camera solution for the new camera is then calculated
based on a set of three-dimensional model coordinates and
corresponding two-dimensional image coordinates in the
new image. This process is referred to below as a 3-D to
2-D process.

In the solving process, it is assumed that the camera can
be represented by a pinhole camera model and that the
internal parameters of the camera are known.

The overall process of building the model data and
performing the camera solutions for a set of images will
now be described with reference to the flowchart of
Figures 6A and 6B. At step 60, the user selects the 2-D
to 2-D camera solution process by selecting the
appropriate mode selecting icon 48 as illustrated in
Figure 4 and performs matching between the first triplet
of images, I_1, I_2 and I_3. This matching process involves
the display of pairs of images for inspection by the user,
who then selects matching pairs by using the mouse 26 and
cursor 42 to select matching features in each of the pair
of images. When the user has finished matching, the user
terminates the matching step by the input of a
predetermined control command.

At step 61, the processor 24 calculates the camera
solution for the triplet of initial virtual cameras L_1,
L_2 and L_3, using the 2-D to 2-D solving process, thereby
calculating the position of the respective image plane
and look direction for each of the three virtual cameras
in the three-dimensional space of the model.

At step 62, the processor 24 calculates model data in 
three dimensions from the measured co-ordinates of 
matching features established for the initial triplet of 
images and stores the results with the matching feature 
data in the concordance table 38. The concordance table 
then contains an accumulation of data in which the two 
dimensional coordinates of matching image points are 
related to the three dimensional co-ordinates of model 
points. 
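The concordance table 38 is described only in terms of what it relates. One possible in-memory layout, assumed here for illustration, keeps one entry per physical feature, mapping each image index n to the feature's pixel co-ordinates in I_n and holding the model point once it has been calculated. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class ConcordanceEntry:
    # image index n -> (u, v) pixel co-ordinates of this feature in image I_n
    image_points: Dict[int, Point2D] = field(default_factory=dict)
    # model co-ordinates once the feature has been triangulated, else None
    model_point: Optional[Point3D] = None


class ConcordanceTable:
    def __init__(self) -> None:
        self.entries: List[ConcordanceEntry] = []

    def add_match(self, n_a: int, pa: Point2D, n_b: int, pb: Point2D) -> ConcordanceEntry:
        """Record a match between images n_a and n_b.

        If pa is already listed for image n_a (exact co-ordinate equality), the
        existing entry is extended, so chains of pairwise matches accumulate
        into one multi-image entry.
        """
        for entry in self.entries:
            if entry.image_points.get(n_a) == pa:
                entry.image_points[n_b] = pb
                return entry
        entry = ConcordanceEntry({n_a: pa, n_b: pb})
        self.entries.append(entry)
        return entry

    def model_points_in(self, n: int) -> List[Tuple[Point2D, Point3D]]:
        """(image point, model point) pairs for image n where model data exists."""
        return [(e.image_points[n], e.model_point) for e in self.entries
                if n in e.image_points and e.model_point is not None]
```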

At step 63, the processor 24 displays a new image I_n (in
this case n = 4) for matching with the preceding image
I_(n-1) and prompts the user to perform matching at step 64
between the new image I_n and the preceding image I_(n-1).
This matching process is illustrated in Figure 4, which
illustrates the display screen 21 where images I_n and I_(n-1)
are displayed for comparison in respective image windows
40 and 41.

Figure 4 also illustrates a row of mode selecting icons
48 which, as mentioned above, may be selected using the
cursor 42 and mouse 26 in order to select the various
modes of operation adopted by the processor 24 in the
modelling and camera solving processes.

At step 64, the user enters co-ordinates of pairs of
matching image points and the processor 24 performs
matching between the new image I_n and previous image I_(n-1)
in a manner which is shown in greater detail in the
flowchart of Figure 7. At step 70 of Figure 7, the
processor 24 controls the display 20 to display the
images I_n and I_(n-1), including indicators 43 in the image
I_(n-1) which identify previously matched image points for
which existing model data is stored in the concordance
table. The user enters co-ordinates of matching image
points by using the mouse 26 to move the cursor 42
between the displayed images and select matching
features. In some cases, the resulting selection signals
received by the processor 24 at step 71 will be
determined at step 72 to define a matching pair of points
which includes a point in I_(n-1) coincident with one of the
indicators 43, such matching points being entered at step
73 into an initial set of two-dimensional coordinate data
to be used in the 3-D to 2-D solving process. The
matching data obtained in the matching step 71 is entered
at step 74 into the concordance table 38 for use in
generating further model data.

The remaining matched points which at step 72 are
determined to relate to features in I_(n-1) not previously
matched are also added at step 74 as new entries in the
concordance table of matched image features to be
available for subsequent use in generating further model
data.
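Steps 72 to 74 amount to splitting the user's matches according to whether their point in I_(n-1) coincides with one of the indicators 43. A minimal sketch, assuming a hypothetical pixel tolerance for "coincident" and a plain dict from indicator positions to model points (both are illustrative choices, not taken from the text):

```python
import math


def split_matches(matches, indicators, tol=2.0):
    """Split user-entered matches between I_(n-1) and I_n (steps 72 to 74).

    matches: list of ((u_prev, v_prev), (u_new, v_new)) pixel pairs
    indicators: dict mapping an indicator's pixel position in I_(n-1) to its
        model point
    tol: hypothetical tolerance, in pixels, for "coincident with an indicator"
    Returns (initial_set, new_features): initial_set holds (model point,
    new-image point) pairs for the 3-D to 2-D solver; new_features holds
    matches with no existing model data, to be added to the table as new entries.
    """
    initial_set, new_features = [], []
    for prev_pt, new_pt in matches:
        hit = None
        for ind_pt, model_pt in indicators.items():
            if math.dist(prev_pt, ind_pt) <= tol:
                hit = model_pt
                break
        if hit is not None:
            initial_set.append((hit, new_pt))
        else:
            new_features.append((prev_pt, new_pt))
    return initial_set, new_features
```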

When at step 75 the matching process is determined to 
have been terminated by the user inputting a 
predetermined control command, the processor 24 then 
begins to process the initial set of two dimensional 
coordinate data. Referring to Figure 6A, the processor 
24 at step 65 begins by identifying the three dimensional 
model coordinates corresponding to each of the two 
dimensional image coordinates for the new image I_n in the
initial set by referring to the concordance table 38 of 
matched image features and model data. 

The camera solution for the virtual camera L_n is then
calculated at step 66 using the 3-D to 2-D solving
process, the result being regarded as a provisional
result since it is based on the initial set of data, which
is limited in size by the number of indicators displayed
in the previous image I_(n-1). In order to make full use of
all of the existing three-dimensional model data, the
processor 24 at step 67 maps the three-dimensional model
points represented by the remainder of the set of model
data into the two-dimensional virtual image plane of the
virtual camera L_n, thereby obtaining a set of
two-dimensional reference points in the image plane 52.

Figure 5 illustrates this mapping process schematically,
where a small set of three-dimensional model coordinates
50 are illustrated as being mapped into a corresponding
set of two-dimensional reference points 51 in the image
plane 52 of camera L_n.
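Under the pinhole model assumed by the solving process, the mapping of step 67 is a rigid transform into the camera frame followed by the intrinsic matrix and a perspective division. A minimal sketch, assuming the camera solution is expressed as a world-to-camera rotation R and translation t, and that the internal parameters form a 3x3 matrix K; the function name is hypothetical:

```python
import numpy as np


def project_points(K, R, t, X):
    """Map N x 3 model points into pixel co-ordinates of a pinhole camera.

    K: 3x3 internal parameters (assumed known, as the text states)
    R, t: world-to-camera rotation (3x3) and translation (3,) from the
        camera solution for L_n
    """
    Xc = X @ R.T + t                  # model frame -> camera frame
    x = Xc @ K.T                      # camera frame -> homogeneous pixels
    return x[:, :2] / x[:, 2:3]       # perspective division
```

For example, with a focal length of 800 pixels, principal point (320, 240), R = I and t = 0, the model point (1, 0, 5) maps to pixel (480, 240).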

At step 68, the processor 24 performs automatic matching
of features in the new image I_n with the reference points
51 obtained in step 67 using a constrained matching 
technique in which the search for a matching feature to 
each of the reference points is confined to a localised 
area proximate to the reference point in the new image. 
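The text does not specify how a matching feature is scored inside the localised area. One simple reading, assumed here for illustration, is to take the nearest feature detected in I_n (for example by the corner detector of Annex A) that lies within a fixed search radius of each reference point 51. The radius value and the function name are hypothetical; a correlation score between image patches would be a natural refinement.

```python
import numpy as np


def constrained_match(reference_pts, candidate_pts, radius=10.0):
    """For each projected reference point, return the index of the nearest
    detected feature in I_n lying within `radius` pixels, or None when the
    local search area contains no feature.

    reference_pts: M x 2 projected model points; candidate_pts: K x 2
    detected feature positions. `radius` is a hypothetical window size.
    """
    candidate_pts = np.asarray(candidate_pts, dtype=np.float64).reshape(-1, 2)
    matches = []
    for r in np.asarray(reference_pts, dtype=np.float64):
        d = np.linalg.norm(candidate_pts - r, axis=1)
        j = int(np.argmin(d)) if len(d) else -1
        matches.append(j if j >= 0 and d[j] <= radius else None)
    return matches
```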

After completing the constrained matching process, the
processor 24 at step 69 is then able to identify an
enlarged set of two-dimensional image coordinates in the
new image I_n for which correspondence is matched with
three-dimensional model coordinates, including the
results of both step 68 and step 65.

A revised result for the camera solution for the virtual
camera L_n is then calculated by again using the 3-D to 2-D
solving process but based on the enlarged set of 2-D
matched coordinates and corresponding 3-D model data at
step 610.

If at step 611 the processor 24 determines that there are
more images to be processed, the process repeats from
step 63 for a new image I_n for which the value of n is
incremented by 1.

When all of the images have been processed, additional
model data is calculated at step 612 of Figure 6B using
all of the matched image feature data accumulated during
each performance of the matching process of step 64 and
the automatic matching process of step 68 for all of the
images, subject to the condition that a feature must be
matched in at least three images before a new model data
point can be determined from it.

Using the expanded model data set established in step
612, the processor 24 at step 613 applies the 3-D to 2-D
solving process to each of the virtual cameras L_n in
order to refine the camera solutions for use in any
subsequent processing.

The 3D to 2D solving process used in steps 66 and 610
will now be described with reference to Figure 8. For
this example, the use of the 3D to 2D process of step 66
is described for camera L_n, where n is greater than 3. As
shown in Figure 9, the solution for camera L_n requires a
set of coordinates for matching points in each of cameras
L_n, L_(n-1) and L_(n-2), where cameras L_(n-1) and L_(n-2)
already have known position and orientation as a result
of earlier solving processes.

Each pair of matching points in L_(n-1) and L_(n-2) has a
corresponding three-dimensional model point in the
existing model data, the association between these sets
of data being defined in the concordance table 38.

For each pair of matching image points represented in the
image data for L_(n-1) and L_(n-2) there is a matching image
point represented in the image data for camera L_n as a
result of the matching process performed in step 64
referred to above.

Reference will be made to the method steps of Figure 8 as
well as the diagram of Figure 9 in the following
description. In implementing the steps of Figure 8, the
processor uses a RANSAC (random sample consensus)
algorithm. At step 80, the processor 24 selects at
random three matches between images I_n, I_(n-1) and I_(n-2), such
that each match comprises sets of two-dimensional image
coordinates expressed in pixel numbers. These three
matches have coordinates which define the apices of
respective imaginary triangles 90, 91 and 92 as shown in
Figure 9. The corresponding three-dimensional
co-ordinates in the model data define model points at the apices
of a further imaginary triangle 93, whose positions are
known in "world coordinates", or in other words relative
to the frame of reference in which the model data is
defined. The triangle 92 of image points in the new
image I_n may therefore be regarded as a two-dimensional
projection of the triangle 93 of model points onto the
virtual image plane 52 of the camera L_n, so that the
position and orientation of the image plane 52 can be
calculated using a standard geometrical transformation
represented in Figure 9 by arrow T.

The result of this calculation will be a set of values
defining the position in world coordinates and the
orientation relative to the model frame of reference of
the image plane 52, and constitutes a first candidate
solution for the required camera solution for L_n.

As shown in Figure 8, step 81 of calculating this first
candidate solution is followed by step 82 of using the
first candidate solution to map all of the model points
corresponding to the initial set of image points into the
image plane of I_n. If the first candidate solution were in
fact a perfect solution, the mapped points would be
expected to substantially coincide with the user-entered
matched image points. In practice, however, the mapped
points will be displaced relative to the matched image
points by a number of pixels, which provides a measure of
the degree of correlation between the mapped points and
matched image points.

At step 83, a correlation calculation is performed
between the mapped points and the matched image points by
counting the number of mapped points which fall within a
radius of a predetermined number of pixels of the matched
image points. In this example, the predetermined number
of pixels is three.

The number of matching pairs of mapped points and matched
image points in the image is the number of inliers for
this candidate solution, each inlier comprising data
defining the co-ordinates of a model point together with
the co-ordinates of the corresponding image points in
each of at least three images.

The above calculation is repeated for a number of further
candidate solutions and at step 84 the processor 24
determines whether the current candidate solution
produces the best result so far in terms of the number of
inliers. If so, the candidate solution and number of
inliers are stored in step 85 as the result of the
process.

At step 86, it is determined whether the required number
of candidate solutions has yet been processed and, if
not, the process repeats from step 80, where a new set of
three matches is selected at random and the above-described
steps are repeated.

When the required number of candidate solutions has been
processed, the processor outputs at step 87 the stored
result in terms of the candidate solution and number of
inliers stored in step 85 for the optimum candidate
solution. Also output are the inliers for the candidate
solution in terms of the set of point matches verified by
the solving process to represent consistent matched data
across the three images I_n, I_(n-1) and I_(n-2).
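The loop of steps 80 to 87 can be sketched as follows. This is a minimal sketch under stated assumptions: the text computes each candidate solution from three model/image matches via the inverted projection geometry of Haralick and Shapiro, which is not reproduced here. In its place the sketch uses a six-point direct linear transform (DLT) that estimates a full 3x4 projection matrix, so each random sample contains six matches rather than three, and only the model points and their I_n co-ordinates are used. The three-pixel inlier radius follows the example above; the trial count, seed and function names are hypothetical.

```python
import numpy as np


def dlt_camera(X, x):
    """Direct linear transform: 3x4 projection matrix from >= 6 model/image matches."""
    rows = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        # u * (P[2] . Xh) - P[0] . Xh = 0 and v * (P[2] . Xh) - P[1] . Xh = 0
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)


def reprojection_errors(P, X, x):
    """Pixel distance between each mapped model point and its matched image point."""
    Xh = np.column_stack([X, np.ones(len(X))])
    p = Xh @ P.T
    return np.linalg.norm(p[:, :2] / p[:, 2:3] - x, axis=1)


def ransac_camera(X, x, n_trials=200, radius=3.0, sample_size=6, seed=0):
    """Keep the candidate camera with the most inliers.

    X: N x 3 model points; x: N x 2 matched pixel co-ordinates in I_n.
    radius: inlier threshold in pixels.
    """
    rng = np.random.default_rng(seed)
    best_P, best_inliers = None, np.zeros(len(X), dtype=bool)
    for _ in range(n_trials):
        idx = rng.choice(len(X), size=sample_size, replace=False)
        P = dlt_camera(X[idx], x[idx])                  # candidate solution
        with np.errstate(divide="ignore", invalid="ignore"):
            inliers = reprojection_errors(P, X, x) < radius   # map and count
        if inliers.sum() > best_inliers.sum():          # best so far?
            best_P, best_inliers = P, inliers
    return best_P, best_inliers
```

The returned boolean mask plays the role of the inlier set output at step 87.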

The calculation referred to above at step 81 makes use of
the well-known projection geometry described, for example,
in "Computer and Robot Vision, Volume 2" by Robert M.
Haralick and Linda G. Shapiro, Addison-Wesley, 1993, pages
85 to 91. The transformation described in that passage
may readily be inverted to suit the calculation required
in the present context, thereby defining the
transformation T referred to above.

Figure 10 shows schematically some of the software
modules utilised in the above process. An image data
file 100 contains image data input from a camera or the
like and a model data file 101 contains the model data
generated from the image data.

Concordance table 38 referred to above includes related
entries identifying the correspondence between matched
image data in two or more images and the corresponding
model data co-ordinates.

An inliers file 102 contains information defining the
inliers found in each of the best candidate camera
solutions and represents a set of point matches which are
correct and verified to be consistent across three or
more images.

The data files 100, 101, 38 and 102 are typically held in 
random access memory 35 during processing and ultimately 
stored in mass storage memory 36 of Figure 2. 



Also shown in Figure 10 are the processing elements
including the 2-D to 2-D solving process 103 and the 3-D
to 2-D solving process 104, which includes both the RANSAC
algorithm 105 and the candidate camera solution
algorithm 106.

The RANSAC algorithm 105 and candidate camera solution
algorithm 106 constitute computer programs comprising
processor-implementable instructions which may be stored
in a storage medium such as floppy disc 28 or may be
downloaded as signals 23 from a network such as the
internet. Such signals and storage media embodying
these instructions therefore constitute aspects of the
present invention. Similarly, other programs for
carrying out the above-described embodiments, including
control software for controlling operation of the above
software modules, may be stored in the storage medium or
transmitted as a signal, thereby constituting further
aspects of the present invention.



ANNEX A



1. CORNER DETECTION



1.1 Summary



The process described below calculates corner points, to
sub-pixel accuracy, from a single grey scale or colour
image. It does this by first detecting edge boundaries in
the image and then choosing corner points to be points
where a strong edge changes direction rapidly. The
method is based on the facet model of corner detection,
described in Haralick and Shapiro [i].

1.2 Algorithm

The algorithm has four stages: 

(1) Create grey scale image (if necessary); 

(2) Calculate edge strengths and directions;

(3) Calculate edge boundaries; 

(4) Calculate corner points. 



1.2.1 Create grey scale image 






The corner detection method works on grey scale images.
For colour images, the colour values are first converted
to floating point grey scale values using the formula:

grey_scale = (0.3 × red) + (0.59 × green) + (0.11 × blue)    (A-1)

This is the standard definition of brightness as defined
by NTSC and described in Foley and van Dam [ii].
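The weighted sum above, applied to a whole image, is a one-liner in NumPy. A minimal sketch, assuming an H x W x 3 array in red, green, blue channel order; the function name is hypothetical:

```python
import numpy as np


def to_grey(rgb):
    """Apply the NTSC weights 0.3, 0.59, 0.11 to an H x W x 3 (R, G, B) image,
    returning an H x W floating point grey scale image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]
```

Because the weights sum to one, a pure white pixel (255, 255, 255) keeps a grey value of 255.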

1.2.2 Calculate edge strengths and directions

The edge strengths and directions are calculated using 
the 7x7 integrated directional derivative gradient 
operator discussed in section 8.9 of Haralick and
Shapiro [i].

The row and column forms of the derivative operator are 
both applied to each pixel in the grey scale image. The 
results are combined in the standard way to calculate the 
edge strength and edge direction at each pixel. 
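The standard way of combining the row and column derivatives is to take the gradient magnitude as the edge strength and the angle of the gradient vector as the edge direction. The sketch below shows that combination only; for brevity it uses NumPy's central-difference `np.gradient` as a stand-in for the 7x7 integrated directional derivative operator, which it does not reproduce. The function name is hypothetical.

```python
import numpy as np


def edge_strength_direction(grey):
    """Combine row and column derivatives into per-pixel strength and direction.

    np.gradient (central differences) stands in for the 7x7 integrated
    directional derivative operator of Haralick and Shapiro.
    """
    d_row, d_col = np.gradient(np.asarray(grey, dtype=np.float64))
    strength = np.hypot(d_row, d_col)        # gradient magnitude
    direction = np.arctan2(d_row, d_col)     # radians, measured from the column axis
    return strength, direction
```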

The output of this part of the algorithm is a complete
derivative image.

1.2.3 Calculate edge boundaries

The edge boundaries are calculated by using a zero
crossing edge detection method based on a set of 5x5
kernels describing a bivariate cubic fit to the
neighbourhood of each pixel.

The edge boundary detection method places an edge at all
pixels which are close to a negatively sloped zero
crossing of the second directional derivative taken in
the direction of the gradient, where the derivatives are
defined using the bivariate cubic fit to the grey level
surface. The subpixel location of the zero crossing is
also stored along with the pixel location.

15 

The method of edge boundary detection is described in 
more detail in. section 8.8.4 of Haralick and Shapiro 1 . 

1.2.4 Calculate corner points

The corner points are calculated using a method which uses the edge boundaries calculated in the previous step.

Corners are associated with two conditions:

(1) the occurrence of an edge boundary; and

(2) significant changes in edge direction.

Each of the pixels on the edge boundary is tested for "cornerness" by considering two points equidistant to it along the tangent direction. If the change in the edge direction between these two points is greater than a given threshold, the point is labelled as a corner. This step is described in section 8.10.1 of Haralick and Shapiro [i].
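One way to realise the cornerness test is sketched below, assuming edge directions in radians that can be sampled at arbitrary (sub)pixel positions through a caller-supplied function. The names `direction_at`, `distance` and `threshold` are hypothetical, and the wrap-around handling of angles is an implementation choice the text does not specify.

```python
import math


def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two directions, in radians (0..pi)."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def is_corner(direction_at, x, y, tangent, distance, threshold):
    """Hypothetical cornerness test for the edge-boundary pixel (x, y).

    direction_at(px, py) returns the edge direction (radians) at a point,
    tangent is the edge tangent angle at (x, y), and the two test points lie
    `distance` pixels either side of (x, y) along that tangent.
    """
    dx = distance * math.cos(tangent)
    dy = distance * math.sin(tangent)
    before = direction_at(x - dx, y - dy)
    after = direction_at(x + dx, y + dy)
    return angle_difference(before, after) > threshold
```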


Finally the corners are sorted on the product of the edge strength magnitude and the change of edge direction. The top 200 corners which are separated by at least 5 pixels are output.
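A sketch of the final ranking and selection, assuming candidate corners arrive as (x, y, edge_strength, direction_change) tuples. The text does not say how the 5 pixel separation is enforced; the greedy pass in rank order used here is one plausible reading.

```python
import math


def select_corners(candidates, max_corners=200, min_separation=5.0):
    """Keep the strongest corners that are at least min_separation pixels apart.

    Candidates are ranked on edge_strength * direction_change and accepted
    greedily, skipping any that lie too close to a corner already kept.
    """
    ranked = sorted(candidates, key=lambda c: c[2] * c[3], reverse=True)
    kept = []
    for x, y, strength, change in ranked:
        if len(kept) == max_corners:
            break
        if all(math.hypot(x - kx, y - ky) >= min_separation for kx, ky, _, _ in kept):
            kept.append((x, y, strength, change))
    return kept


result = select_corners([(0, 0, 10.0, 1.0), (2, 0, 9.0, 1.0), (10, 0, 5.0, 1.0)])
```

The second candidate is rejected because it lies only 2 pixels from the stronger corner at (0, 0).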

2. FEATURE TRACKING

2.1 Summary

The process described below tracks feature points (typically corners) across a sequence of grey scale or colour images.

The tracking method uses a constant image velocity Kalman filter to predict the motion of the corners, and a correlation based matcher to make the measurements of corner correspondences.

The method assumes that the motion of corners is smooth enough across the sequence of input images that a constant velocity Kalman filter is useful, and that corner measurements and motion can be modelled by Gaussians.

2.2 Algorithm 

1) Input corners from an image.

2) Predict forward using the Kalman filter.

3) If the position uncertainty of the predicted corner is greater than a threshold, A, as measured by the state positional variance, drop the corner from the list of currently tracked corners.

4) Input a new image from the sequence.

5) For each of the currently tracked corners:

a) search a window in the new image for pixels which match the corner;

b) update the corresponding Kalman filter, using any new observations (i.e. matches).

6) Input the corners from the new image as new points to be tracked (first filtering them to remove any which are too close to existing tracked points).

7) Go back to (2).

2.2.1 Prediction

This step uses the following standard Kalman filter equations for prediction, assuming a constant velocity and random uniform Gaussian acceleration model for the dynamics:

$$\mathbf{x}_{n+1} = \Theta_{n+1,n}\,\mathbf{x}_n \qquad \text{(A-2)}$$

$$K_{n+1} = \Theta_{n+1,n}\,K_n\,\Theta_{n+1,n}^{T} + Q_n \qquad \text{(A-3)}$$

where x is the 4D state of the system (defined by the position and velocity vector of the corner), K is the state covariance matrix, Θ is the transition matrix, and Q is the process covariance matrix.

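In this model, the transition matrix and process covariance matrix are constant and have the following values:

$$\Theta_{n+1,n} = \begin{pmatrix} I & I \\ 0 & I \end{pmatrix} \qquad \text{(A-4)}$$

$$Q_n = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_v^2 I \end{pmatrix} \qquad \text{(A-5)}$$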
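A NumPy sketch of the prediction step (A-2 to A-5), assuming a time step of one frame, the state ordered as (x, y, vx, vy), and the parameter values of section 2.2.4 as defaults. The function names are illustrative only.

```python
import numpy as np


def transition_matrix() -> np.ndarray:
    """Theta (A-4): position advances by one frame of velocity; velocity is unchanged."""
    eye, zero = np.eye(2), np.zeros((2, 2))
    return np.block([[eye, eye], [zero, eye]])


def process_covariance(sigma_v2: float) -> np.ndarray:
    """Q (A-5): the random acceleration only adds uncertainty to the velocity terms."""
    q = np.zeros((4, 4))
    q[2:, 2:] = sigma_v2 * np.eye(2)
    return q


def predict(x: np.ndarray, k: np.ndarray, sigma_v2: float = 50.0):
    """Propagate the state (A-2) and its covariance (A-3) one frame forward."""
    theta = transition_matrix()
    return theta @ x, theta @ k @ theta.T + process_covariance(sigma_v2)


# Start from a measured corner at (10, 20) with zero velocity and the A-12 covariance.
x0 = np.array([10.0, 20.0, 0.0, 0.0])
k0 = np.zeros((4, 4))
k0[2:, 2:] = 200.0 * np.eye(2)
x1, k1 = predict(x0, k0)
```

After one frame the velocity uncertainty of 200 (pixels/frame)² has leaked into the position terms, which is what eventually triggers the loss-of-track test for corners that go unmatched.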

2.2.2 Searching and matching

This uses the positional uncertainty (given by the top two diagonal elements of the state covariance matrix, K) to define a region in which to search for new measurements (i.e. a range gate).

The range gate is a rectangular region of dimensions:

(equation A-6)

The correlation score between a window around the previously measured corner and each of the pixels in the range gate is calculated.

The two top correlation scores are kept.

If the top correlation score is larger than a threshold, c₀, and the difference between the two top correlation scores is larger than a threshold, ΔC, then the pixel with the top correlation score is kept as the latest measurement.
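The source does not name the correlation measure. The sketch below assumes normalised cross-correlation between flattened grey-level windows, and applies the two-threshold acceptance rule to a mapping from candidate pixels to scores; the defaults c0 = 0.9 and delta_c = 0.001 are the values given in section 2.2.4.

```python
import math


def ncc(window_a, window_b) -> float:
    """Normalised cross-correlation of two equal-length lists of grey values."""
    n = len(window_a)
    mean_a = sum(window_a) / n
    mean_b = sum(window_b) / n
    da = [a - mean_a for a in window_a]
    db = [b - mean_b for b in window_b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # a flat window carries no correlation information
    return sum(a * b for a, b in zip(da, db)) / denom


def choose_measurement(scores, c0=0.9, delta_c=0.001):
    """Return the best pixel in the range gate, or None if the match is weak or ambiguous.

    scores maps each candidate pixel to its correlation score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_pixel, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else -math.inf
    if best > c0 and best - second > delta_c:
        return best_pixel
    return None
```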
2.2.3 Update

The measurement is used to update the Kalman filter in the standard way:

$$G = K H^{T}\left(H K H^{T} + R\right)^{-1} \qquad \text{(A-7)}$$

$$\mathbf{x} \leftarrow \mathbf{x} + G\left(\hat{\mathbf{x}} - H\mathbf{x}\right) \qquad \text{(A-8)}$$

$$K \leftarrow \left(I - G H\right) K \qquad \text{(A-9)}$$

where G is the Kalman gain, H is the measurement matrix, x̂ is the measured corner position, and R is the measurement covariance matrix.

In this implementation, the measurement matrix and measurement covariance matrix are both constant, being given by:

$$H = \begin{pmatrix} I & 0 \end{pmatrix} \qquad \text{(A-10)}$$

$$R = \sigma^2 I \qquad \text{(A-11)}$$
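A NumPy sketch of the update equations A-7 to A-11, assuming the same (x, y, vx, vy) state layout as the prediction step and a measured corner position z. It inverts HKH^T + R explicitly to mirror A-7; a production filter would more usually solve the linear system instead.

```python
import numpy as np

H = np.hstack([np.eye(2), np.zeros((2, 2))])  # A-10: only the position is measured


def update(x, k, z, sigma2):
    """One Kalman update with R = sigma2 * I (A-11).

    x: predicted state (x, y, vx, vy); k: its 4x4 covariance;
    z: measured corner position (x, y).
    """
    r = sigma2 * np.eye(2)
    g = k @ H.T @ np.linalg.inv(H @ k @ H.T + r)  # Kalman gain, A-7
    x_new = x + g @ (z - H @ x)                   # A-8
    k_new = (np.eye(4) - g @ H) @ k               # A-9
    return x_new, k_new
```

With equal predicted and measurement variances the gain is one half, so the updated position lies midway between prediction and measurement.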

2.2.4 Parameters

The parameters of the algorithm are:

Initial conditions: x₀ and K₀.

Process velocity variance: σ_v².

Measurement variance: σ².

Position uncertainty threshold for loss of track: A.

Correlation threshold: c₀.

Matching ambiguity threshold: ΔC.



For the initial conditions, the position of the first corner measurement and zero velocity are used, with an initial covariance matrix of the form:

$$K_0 = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_0^2 I \end{pmatrix} \qquad \text{(A-12)}$$

where σ₀² is set to σ₀² = 200 (pixels/frame)².

In any case, the algorithm's behaviour over a long sequence does not depend strongly on the initial conditions.

The process velocity variance is set to the fixed value of 50 (pixels/frame)². The process velocity variance would have to be increased above this for a hand-held sequence. In fact, it is straightforward to obtain a reasonable value for the process velocity variance adaptively.

The measurement variance is obtained from the following model:

$$\sigma^2 = r\kappa + a \qquad \text{(A-13)}$$

where κ = √(K₁₁K₂₂) is a measure of the positional uncertainty, r is a parameter related to the likelihood of obtaining an outlier, and a is a parameter related to the measurement uncertainty of inliers. The parameters are set to r = 0.1 and a = 1.0.
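A one-function sketch of the heuristic in A-13, assuming K₁₁ and K₂₂ are the positional variances of the predicted state in pixels²; the defaults are the r and a values quoted above.

```python
import math


def measurement_variance(k11: float, k22: float, r: float = 0.1, a: float = 1.0) -> float:
    """sigma^2 = r * kappa + a (A-13), with kappa = sqrt(K11 * K22).

    A large positional uncertainty means a large range gate and hence a higher
    chance of an outlier, so the returned variance grows with kappa.
    """
    kappa = math.sqrt(k11 * k22)
    return r * kappa + a
```

For example, a prediction with K₁₁ = K₂₂ = 100 pixels² gives σ² = 0.1 × 100 + 1.0 = 11 pixels².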

This model takes into account, in a heuristic way, the fact that an outlier is more likely to be obtained if the range gate is large.

The measurement variance (in fact the full measurement covariance matrix R) could also be obtained from the behaviour of the auto-correlation in the neighbourhood of the measurement. However, this would not take into account the likelihood of obtaining an outlier.
The remaining parameters are set to the values: A = 400 pixels², c₀ = 0.9 and ΔC = 0.001.



3. 3D SURFACE GENERATION

3.1 Architecture

In the method described below, it is assumed that the object can be segmented from the background in a set of images completely surrounding the object. Although this restricts the generality of the method, this constraint can often be arranged in practice, particularly for small objects.

The method consists of five processes, which are run consecutively:

First, for all the images in which the camera positions and orientations have been calculated, the object is segmented from the background, using colour information. This produces a set of binary images, where the pixels are marked as being either object or background.

The segmentations are used, together with the 
camera positions and orientations, to generate a 
voxel carving, consisting of a 3D grid of voxels 
enclosing the object. Each of the voxels is marked 
as being either object or empty space. 

The voxel carving is turned into a 3D surface triangulation, using a standard triangulation algorithm (marching cubes).

The number of triangles is reduced substantially by 
passing the triangulation through a decimation 
process. 

Finally the triangulation is textured, using 
appropriate parts of the original images to provide 
the texturing on the triangles. 
3.2 Segmentation

The aim of this process is to segment an object (in front of a reasonably homogeneous coloured background) in an image using colour information. The resulting binary image is used in voxel carving.

Two alternative methods are used: 

Method 1: input a single RGB colour value representing the background colour. Each RGB pixel in the image is examined and, if the Euclidean distance to the background colour (in RGB space) is less than a specified threshold, the pixel is labelled as background (BLACK).
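A short Python sketch of Method 1, assuming images held as rows of (r, g, b) tuples and the output values BLACK = 0 and WHITE = 255 (the source names the labels but not their numeric values). `math.dist` requires Python 3.8 or later.

```python
import math

BLACK, WHITE = 0, 255  # assumed output values for background / foreground


def segment_method1(image, background, threshold):
    """Label a pixel BLACK if its RGB Euclidean distance to `background`
    is below `threshold`, otherwise WHITE."""
    return [
        [BLACK if math.dist(pixel, background) < threshold else WHITE for pixel in row]
        for row in image
    ]


mask = segment_method1([[(0, 0, 250), (200, 0, 0)]], background=(0, 0, 255), threshold=10.0)
```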

Method 2: input a "blue" image containing a representative region of the background.

The algorithm has two stages:

(1) Build a hash table of quantised background colours;

(2) Use the table to segment each image.

Step 1) Build hash table

Go through each RGB pixel, p, in the "blue" background image.

Set q to be a quantised version of p. Explicitly:

$$q = \frac{p + t/2}{t} \qquad \text{(A-14)}$$

where t is a threshold determining how near RGB values need to be to background colours to be labelled as background.

The quantisation step has two effects:

1) reducing the number of RGB pixel values, thus increasing the efficiency of hashing;

2) defining the threshold for how close an RGB pixel has to be to a background colour pixel to be labelled as background.
q is now added to a hash table (if not already in the table) using the (integer) hashing function

$$h(q) = (q_{\text{red}} \mathbin{\&} 7)\cdot 2^6 + (q_{\text{green}} \mathbin{\&} 7)\cdot 2^3 + (q_{\text{blue}} \mathbin{\&} 7) \qquad \text{(A-15)}$$

That is, the 3 least significant bits of each colour field are used. This function is chosen to try to spread out the data into the available bins. Ideally each bin in the hash table has a small number of colour entries. Each quantised colour RGB triple is only added once to the table (the frequency of a value is irrelevant).

Step 2) Segment each image

Go through each RGB pixel, v, in each image.

Set w to be the quantised version of v as before.

To decide whether w is in the hash table, explicitly look at all the entries in the bin with index h(w) and see if any of them are the same as w. If yes, then v is a background pixel: set the corresponding pixel in the output image to BLACK. If no, then v is a foreground pixel: set the corresponding pixel in the output image to WHITE.
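A Python sketch of both stages of Method 2, assuming integer RGB channels and applying A-14 per channel with integer (floor) division, so that the bit masks of A-15 are well defined. BLACK = 0 and WHITE = 255 are assumed output values, and the bins are plain lists to mirror the explicit search described above.

```python
from collections import defaultdict

BLACK, WHITE = 0, 255  # assumed output values for background / foreground


def quantise(pixel, t):
    """A-14 per channel: q = (p + t/2) / t, with integer division."""
    return tuple((c + t // 2) // t for c in pixel)


def bin_index(q):
    """A-15: combine the 3 least significant bits of each quantised channel."""
    red, green, blue = q
    return (red & 7) * 2**6 + (green & 7) * 2**3 + (blue & 7)


def build_table(background_pixels, t):
    """Step 1: add each distinct quantised background colour to its bin once."""
    table = defaultdict(list)
    for p in background_pixels:
        q = quantise(p, t)
        bucket = table[bin_index(q)]
        if q not in bucket:
            bucket.append(q)
    return table


def is_background(v, table, t):
    """Search every entry in the bin for w = quantise(v) (table.get adds no empty bins)."""
    w = quantise(v, t)
    return any(entry == w for entry in table.get(bin_index(w), []))


def segment_method2(image, table, t):
    """Step 2: BLACK where the quantised colour is in the table, otherwise WHITE."""
    return [[BLACK if is_background(v, table, t) else WHITE for v in row] for row in image]


table = build_table([(0, 0, 255), (5, 5, 250)], t=16)
mask = segment_method2([[(2, 3, 250), (200, 50, 40)]], table, t=16)
```

Both background samples quantise to the same triple (0, 0, 16), so the table holds a single entry.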

Post Processing: For both methods a post process is performed to fill small holes and remove small isolated regions.

A median filter is used with a circular window. (A circular window is chosen to avoid biasing the result in the x or y directions.)

Build a circular mask of radius r. Explicitly store the start and end values for each scan line on the circle.

Go through each pixel in the binary image.

Place the centre of the mask on the current pixel. Count the number of BLACK pixels and the number of WHITE pixels in the circular region.

If (#WHITE pixels ≥ #BLACK pixels) then set the corresponding output pixel to WHITE. Otherwise the output pixel is BLACK.
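A pure-Python sketch of the circular median filter, assuming a binary image stored as rows of BLACK = 0 or WHITE = 255, and that pixels outside the image are simply ignored (the text does not say how borders are handled). As described above, the mask stores the start and end column offsets of each scan line.

```python
import math

BLACK, WHITE = 0, 255  # assumed pixel values in the binary segmentation image


def circular_mask(radius):
    """(row offset, start column offset, end column offset) for each scan line of the circle."""
    spans = []
    for dy in range(-radius, radius + 1):
        half = math.isqrt(radius * radius - dy * dy)
        spans.append((dy, -half, half))
    return spans


def median_filter_binary(image, radius):
    """Majority (median) vote of BLACK and WHITE pixels inside a circular window."""
    height, width = len(image), len(image[0])
    spans = circular_mask(radius)
    out = [[BLACK] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            white = black = 0
            for dy, start, end in spans:
                yy = y + dy
                if not 0 <= yy < height:
                    continue  # scan line falls outside the image
                for xx in range(max(0, x + start), min(width, x + end + 1)):
                    if image[yy][xx] == WHITE:
                        white += 1
                    else:
                        black += 1
            out[y][x] = WHITE if white >= black else BLACK
    return out
```

A single isolated WHITE pixel is removed, and a single BLACK hole in a WHITE region is filled, because each is outvoted by its neighbours.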



3.3 Voxel carving

The aim of this process is to produce a 3D voxel grid, enclosing the object, with each of the voxels marked as either object or empty space.

The input to the algorithm is:

- a set of binary segmentation images, each of which is associated with a camera position and orientation;

- 2 sets of 3D co-ordinates, (xmin, ymin, zmin) and (xmax, ymax, zmax), describing the opposite vertices of a cube surrounding the object;

- a parameter, n, giving the number of voxels required in the voxel grid.

A pre-processing step calculates a suitable size for the voxels (they are cubes) and the 3D locations of the voxels, using n, (xmin, ymin, zmin) and (xmax, ymax, zmax).

Then, for each of the voxels in the grid, the mid-point of the voxel cube is projected into each of the segmentation images. If the projected point falls onto a pixel which is marked as background in any of the images, then the corresponding voxel is marked as empty space; otherwise it is marked as belonging to the object.
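A NumPy sketch of the carving loop, assuming each camera is supplied as a hypothetical 3x4 projection matrix that maps homogeneous world points to homogeneous pixel co-ordinates (column, row, w), and each segmentation as a boolean array that is True for object pixels. The voxel size is chosen so that the box holds roughly n cubes; points that project outside an image are left uncarved, which is one possible reading of the text.

```python
import numpy as np


def carve(segmentations, projections, lo, hi, n):
    """Return a boolean voxel grid: True = object, False = empty space."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    size = (np.prod(hi - lo) / n) ** (1.0 / 3.0)       # cube edge for about n voxels
    counts = np.maximum(np.floor((hi - lo) / size).astype(int), 1)
    axes = [lo[i] + size * (np.arange(counts[i]) + 0.5) for i in range(3)]  # mid-points
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    centres = np.stack([xs.ravel(), ys.ravel(), zs.ravel(), np.ones(xs.size)])
    occupied = np.ones(xs.size, dtype=bool)
    for mask, p in zip(segmentations, projections):
        u, v, w = p @ centres
        safe_w = np.where(w > 0, w, 1.0)                # avoid dividing by zero
        col = np.floor(u / safe_w).astype(int)
        row = np.floor(v / safe_w).astype(int)
        inside = ((w > 0) & (col >= 0) & (col < mask.shape[1])
                  & (row >= 0) & (row < mask.shape[0]))
        on_background = np.zeros(xs.size, dtype=bool)
        on_background[inside] = ~mask[row[inside], col[inside]]
        occupied &= ~on_background                      # background in any view -> empty
    return occupied.reshape(xs.shape)


# Toy orthographic camera looking along z: pixel (column, row) = (x, y).
P = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])
silhouette = np.zeros((4, 4), dtype=bool)
silhouette[:2, :2] = True
grid = carve([silhouette], [P], (0, 0, 0), (4, 4, 4), 64)
```

With a single view the result is the silhouette extruded along the viewing direction; additional views from around the object carve it down further.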

Voxel carving is described further in "Rapid Octree Construction from Image Sequences" by R. Szeliski in CVGIP: Image Understanding, Volume 58, Number 1, July 1993, pages 23-32.

3.4 Marching cubes 

The aim of the process is to produce a surface triangulation from a set of samples of an implicit function representing the surface (for instance a signed distance function). In the case where the implicit function has been obtained from a voxel carve, the implicit function takes the value -1 for samples which are inside the object and +1 for samples which are outside the object.

Marching cubes is an algorithm that takes a set of samples of an implicit surface (e.g. a signed distance function) sampled at regular intervals on a voxel grid, and extracts a triangulated surface mesh. Lorensen and Cline [iii] and Bloomenthal [iv] give details on the algorithm and its implementation.

The marching-cubes algorithm constructs a surface mesh by "marching" around the cubes while following the zero crossings of the implicit surface f(x) = 0, adding to the triangulation as it goes. The signed distance allows the marching-cubes algorithm to interpolate the location of the surface with higher accuracy than the resolution of the volume grid. The marching-cubes algorithm can be used as a continuation method (i.e. it finds an initial surface point and extends the surface from this point).
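The interpolation that gives marching cubes its sub-voxel accuracy can be sketched for a single cube edge, assuming the implicit function varies linearly along the edge. With the ±1 values produced by a voxel carve the crossing always lands at the edge mid-point; a true signed distance moves it towards the actual surface.

```python
def edge_crossing(p0, p1, f0, f1):
    """Point on the cube edge p0-p1 where the linearly interpolated implicit
    function crosses zero; assumes f0 and f1 have opposite signs."""
    t = f0 / (f0 - f1)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```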
3.5 Decimation

The aim of the process is to reduce the number of triangles in the model, making the model more compact and therefore easier to load and render in real time.

The process reads in a triangular mesh and then removes each vertex in turn, in random order, to see whether the vertex contributes to the shape of the surface (i.e. once the resulting hole is filled, whether the vertex lies a "long" way from the filled hole). Vertices which do not contribute to the shape are kept out of the triangulation. This results in fewer vertices (and hence triangles) in the final model.

The algorithm is described below in pseudo-code.

INPUT
    Read in vertices
    Read in triples of vertex IDs making up triangles

PROCESSING
    Repeat NVERTEX times
        Choose a random vertex, V, which hasn't been chosen before
        Locate S, the set of all triangles having V as a vertex
        Order S so adjacent triangles are next to each other
        Re-triangulate the triangle set, ignoring V
            (i.e. remove the selected triangles and V, then fill in the hole)
        Find the maximum distance between V and the plane of each new triangle
        If (distance < threshold)
            Discard V and keep the new triangulation
        Else
            Keep V and return to the old triangulation

OUTPUT
    Output list of kept vertices
    Output updated list of triangles



The process therefore combines adjacent triangles in the model produced by the marching cubes algorithm, if this can be done without introducing large errors into the model.



The vertices are selected in random order to avoid gradually eroding a large part of the surface by consecutively removing neighbouring vertices.
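A compact Python sketch of the decimation loop, assuming consistently oriented triangles given as tuples of vertex IDs, vertex positions in a NumPy array, and a simple fan triangulation to fill each hole (the text does not specify the hole-filling method). Boundary vertices, whose surrounding triangles do not form a closed ring, are left in place.

```python
import random

import numpy as np


def ordered_ring(v, triangles):
    """Neighbours of v in cyclic order, or None if v is a boundary/non-manifold vertex."""
    nxt = {}
    for tri in triangles:
        if v in tri:
            i = tri.index(v)
            nxt[tri[(i + 1) % 3]] = tri[(i + 2) % 3]  # edge opposite v, in winding order
    if not nxt:
        return None
    start = next(iter(nxt))
    ring = [start]
    while True:
        following = nxt.get(ring[-1])
        if following is None or len(ring) > len(nxt):
            return None
        if following == start:
            return ring
        ring.append(following)


def plane_distance(point, a, b, c):
    """Distance from point to the plane through triangle (a, b, c)."""
    normal = np.cross(b - a, c - a)
    length = np.linalg.norm(normal)
    return 0.0 if length == 0.0 else abs(np.dot(point - a, normal)) / length


def try_remove(v, vertices, triangles, threshold):
    """Remove v, fan-fill the hole, and keep the result only if v lies close to it."""
    ring = ordered_ring(v, triangles)
    if ring is None or len(ring) < 3:
        return triangles, False
    new_tris = [(ring[0], ring[i], ring[i + 1]) for i in range(1, len(ring) - 1)]
    worst = max(plane_distance(vertices[v], *(vertices[j] for j in t)) for t in new_tris)
    if worst < threshold:
        return [t for t in triangles if v not in t] + new_tris, True
    return triangles, False


def decimate(vertices, triangles, threshold, seed=0):
    """Visit every vertex once in random order; return kept vertex IDs and triangles."""
    order = list(range(len(vertices)))
    random.Random(seed).shuffle(order)
    kept = set(order)
    for v in order:
        if any(v in t for t in triangles):
            triangles, removed = try_remove(v, vertices, triangles, threshold)
            if removed:
                kept.discard(v)
    return sorted(kept), triangles
```

For a nearly flat fan of four triangles the centre vertex is discarded and the square is re-covered by two triangles; raising the centre well above the plane makes the distance test fail, so the original triangulation is kept.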

3.6 Further Surface Generation Techniques

Further techniques which may be employed to generate a 3D computer model of an object surface include voxel colouring, for example as described in "Photorealistic Scene Reconstruction by Voxel Coloring" by Seitz and Dyer in Proc. Conf. Computer Vision and Pattern Recognition 1997, pp. 1067-1073, "Plenoptic Image Editing" by Seitz and Kutulakos in Proc. 6th International Conference on Computer Vision, pp. 17-24, "What Do N Photographs Tell Us About 3D Shape?" by Kutulakos and Seitz in University of Rochester Computer Sciences Technical Report 680, January 1998, and "A Theory of Shape by Space Carving" by Kutulakos and Seitz in University of Rochester Computer Sciences Technical Report 692, May 1998.

4. TEXTURING

The aim of the process is to texture each surface polygon (typically a triangle) with the most appropriate image texture. The output of the process is a VRML model of the surface, complete with texture co-ordinates.



The image in which a triangle has the largest projected area is a good image to use for texturing that triangle, as it is the image in which the texture will appear at the highest resolution.

A good approximation to the image with the largest projected area, under the assumption that there is no substantial difference in scale between the different images, can be obtained in the following way.

For each surface triangle, the image "i" is found such that the triangle is the most front facing (i.e. having the greatest value for n_t · v_i, where n_t is the triangle normal and v_i is the viewing direction for the "i"th camera). The vertices of the projected triangle are then used as texture co-ordinates in the resulting VRML model.
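A NumPy sketch of the per-triangle image choice and the texture co-ordinate computation, assuming unit viewing directions v_i oriented so that a larger n_t · v_i means a more front-facing view (as the text states), and cameras given as hypothetical 3x4 projection matrices. Occlusion is ignored, as in the method described.

```python
import numpy as np


def best_image(normal, view_dirs):
    """Index i of the most front-facing view, i.e. the largest n_t . v_i."""
    scores = [float(np.dot(normal, v)) for v in view_dirs]
    return int(np.argmax(scores))


def texture_coords(triangle, projection):
    """Project a 3x3 array of triangle vertices with a 3x4 camera matrix to pixel co-ordinates."""
    homog = np.hstack([triangle, np.ones((3, 1))]) @ projection.T
    return homog[:, :2] / homog[:, 2:3]


tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
views = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0])]
P = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])
chosen = best_image(np.array([0.0, 0.0, 1.0]), views)
uv = texture_coords(tri, P)
```

In a full pipeline the pixel co-ordinates would still need normalising by the image width and height before being written as VRML texture co-ordinates.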






This technique can fail where there is a substantial amount of self-occlusion, or several objects occluding each other. This is because the technique does not take into account the fact that the object may occlude the selected triangle. However, in practice this does not appear to be much of a problem.



It has been found that, if every image is used for texturing, then very large VRML models can be produced. These can be cumbersome to load and render in real time. Therefore, in practice, a subset of images is used to texture the model. This subset may be specified in a configuration file.

CLAIMS:

1. A method of creating a 3-D model of an object by 
processing images taken from a series of respective 
camera positions relative to the object; 
the method comprising; 

processing an initial sequence of the images to 
define respective image co-ordinates of matching features 
to generate therefrom a set of model data defining model 
points in a 3-D space of the model and to obtain 
respective camera solutions representative of positions 
and orientations of virtual cameras in the 3-D space 
defining views of the model corresponding to the images; 
and 

adding a new image to the sequence and processing 
the new image to obtain a camera solution for a 
corresponding new virtual camera for use in generating 
further model data; 

wherein the processing of the new image comprises; 

(a) identifying a plurality of image points in the 
new image which are matched to a respective plurality of 
image points of at least one preceding image of the 
sequence for which respective 3-D model data defining 
corresponding model points exists; 

(b) determining a set of 2-D image co-ordinates of the identified image points in the new image and co-ordinates of respective model points; and

(c) processing the set of 2-D image point co- 
ordinates and respective 3-D model point co-ordinates to 
obtain the camera solution for the new image using a 
solving process in which the position and orientation of 
an image plane representative of the new virtual camera 
are calculated from a geometrical relationship in the 3-D 
model space between model points and image points defined 
by the set of co-ordinates. 

2. A method as claimed in claim 1 wherein the solving 
process of step (c) comprises: 

selecting a subset of image points and corresponding 
model points defined by the set of co-ordinates; 

calculating a candidate camera solution from the 
geometrical relationship between the points defined by 
the subset; 

repeating the selecting and calculating step for 
different subsets to obtain successive candidate camera 
solutions ; 

evaluating the candidate camera solutions; and 
selecting a best candidate camera solution on the 
basis of the evaluating step. 



3. A method as claimed in claim 2 wherein each subset comprises a selection of three model points and respective image points, the three model points defining apices of a first triangle and the three image points defining apices of a second triangle, and whereby the geometrical relationship is defined by the second triangle constituting a mapping of the first triangle onto the image plane.

4. A method as claimed in claim 3 wherein the mapping comprises a perspective mapping.

5. A method as claimed in any of claims 2 to 4 wherein the evaluating step comprises mapping model points defined by the set of existing model data into the image plane using the candidate solution to obtain co-ordinates of reference points in the image plane; and

correlating the reference points with the image points.

6. A method as claimed in claim 5 wherein the correlating step comprises determining whether each image point lies within a predetermined number of pixel units from a respective reference point and counting the number of such image points as a measure of correlation.

7. A method as claimed in claim 6, wherein the step of selecting the best candidate camera solution comprises selection according to the highest measure of correlation.

8. A method as claimed in any of claims 2 to 7 
comprising determining a set of inliers for the best 
candidate solution wherein each inlier comprises data 
defining co-ordinates of a model point together with co- 
ordinates of corresponding image points in each of at 
least three images of the sequence. 

9. A method as claimed in any of claims 2 to 8, 
comprising the step of using the best camera solution to 
project the remainder of the set of existing model data 
into the image plane of the new virtual camera to obtain 
a set of further reference points in the image plane; 

performing matching using the further reference points to identify matching further image points of the new image; and

adding the co-ordinates of the further image points 
and respective model points to the set of co-ordinates 
determined at step (b) to thereby obtain an enlarged set 
of co-ordinates. 
10. A method as claimed in claim 9 comprising processing the enlarged set of co-ordinates using the solving process of step (c) to obtain a revised result for the best camera solution.

11. A method as claimed in claim 10 further comprising 
generating further model data in accordance with the 
revised camera solution using image co-ordinates of 
matching features in the new image and preceding images 
of the sequence, and adding the further model data to the 
set of model data to form an expanded set of model data. 

12. A method as claimed in claim 11 further comprising 
repeating the calculation of camera solutions for the 
sequence of images using the expanded set of model data. 

13. Apparatus for creating a 3-D model of an object by 
processing images taken from a series of respective 
camera positions relative to the object; 

the apparatus comprising; 

means for processing an initial sequence of the 
images to define respective image co-ordinates of 
matching features to generate therefrom a set of model 
data defining model points in a 3-D space of the model 
and to obtain respective camera solutions representative 
of positions and orientations of virtual cameras in the
3-D space defining views of the model corresponding to
the images; and

means for adding a new image to the sequence and 
processing the new image to obtain a camera solution for 
a corresponding new virtual camera for use in generating 
further model data; 

wherein the means for processing of the new image comprises;

(a) means for identifying a plurality of image 
points in the new image which are matched to a respective 
plurality of image points of at least one preceding 
image of the sequence for which respective 3-D model data 
defining corresponding model points exists;

(b) means for determining a set of 2-D image co- 
ordinates of the identified image points in the new image 
and co-ordinates of respective model points; and 

(c) solving means for processing the set of 2-D 
image point co-ordinates and respective 3-D model point 
co-ordinates to obtain the camera solution for the new 
image using a solving process in which the position and 
orientation of an image plane representative of the new 
virtual camera are calculated from a geometrical 
relationship in the 3-D model space between model points 
and image points defined by the set of co-ordinates. 
14. Apparatus as claimed in claim 13 wherein the solving means comprises:

means for selecting a subset of image points and corresponding model points defined by the set of co-ordinates;

means for calculating a candidate camera solution from the geometrical relationship between the points defined by the subset;

means for repeating the selecting and calculating step for different subsets to obtain successive candidate camera solutions;

means for evaluating the candidate camera solutions; and

means for selecting a best candidate camera solution.



15. Apparatus as claimed in claim 14 wherein each subset comprises a selection of three model points and respective image points, the three model points defining apices of a first triangle and the three image points defining apices of a second triangle, and whereby the geometrical relationship is defined by the second triangle constituting a mapping of the first triangle onto the image plane.

16. Apparatus as claimed in claim 15 wherein the mapping comprises a perspective mapping.

17. Apparatus as claimed in any of claims 14 to 16 wherein the evaluating means comprises means for mapping model points defined by the set of existing model data into the image plane using the candidate solution to obtain co-ordinates of reference points in the image plane; and

means for correlating the reference points with the image points.

18. Apparatus as claimed in claim 17, wherein the correlating means comprises means for determining whether each image point lies within a predetermined number of pixel units from a respective reference point and counting the number of such image points as a measure of correlation.

19. Apparatus as claimed in claim 18, wherein the means for selecting the best candidate camera solution comprises means for selection according to the highest measure of correlation.



20. Apparatus as claimed in any of claims 14 to 19 comprising means for determining a set of inliers for the best candidate solution wherein each inlier comprises data defining co-ordinates of a model point together with co-ordinates of corresponding image points in each of at least three images of the sequence.

21. Apparatus as claimed in any of claims 14 to 20 wherein the solving means is operable to use the best camera solution to project the remainder of the set of existing model data into the image plane of the new virtual camera to obtain a set of further reference points in the image plane;

to perform matching using the further reference points to identify matching further image points of the new image; and

to add the co-ordinates of the further image points and respective model points to the set of co-ordinates determined at step (b) to thereby obtain an enlarged set of co-ordinates.

22. Apparatus as claimed in claim 21 wherein the solving means is operable to process the enlarged set of co-ordinates to obtain a revised result for the best camera solution.



23. Apparatus as claimed in claim 22 further comprising
means for generating further model data in accordance 
with the revised camera solution using image co-ordinates 
of matching features in the new image and preceding 
images of the sequence, and adding the further model data 
to the set of model data to form an expanded set of model 
data. 

24. Apparatus as claimed in claim 23 further comprising 
means for repeating the calculation of camera solutions 
for the sequence of images using the expanded set of 
model data.

25. In a method of creating a 3-D model of an object by 
processing images taken from a series of respective 
camera positions relative to the object; 

the method comprising; 

processing an initial sequence of the images to 
define respective image co-ordinates of matching features 
to generate therefrom a set of model data defining model 
points in a 3-D space of the model and to obtain 
respective camera solutions representative of positions 
and orientations of virtual cameras in the 3-D space 
defining views of the model corresponding to the images; 

an improvement comprising: 
adding a new image to the sequence and processing the new image to obtain a camera solution for a corresponding new virtual camera for use in generating further model data;

wherein the processing of the new image comprises;

(a) identifying a plurality of image points in the new image which are matched to a respective plurality of image points of at least one preceding image of the sequence for which respective 3-D model data defining corresponding model points exists;

(b) determining a set of 2-D image co-ordinates of the identified image points in the new image and co-ordinates of respective model points; and

(c) processing the set of 2-D image point co-ordinates and respective 3-D model point co-ordinates to obtain the camera solution for the new image using a solving process in which the position and orientation of an image plane representative of the new virtual camera are calculated from a geometrical relationship in the 3-D model space between model points and image points defined by the set of co-ordinates.



26. In an apparatus for creating a 3-D model of an object by processing images taken from a series of respective camera positions relative to the object;

the apparatus comprising;

means for processing an initial sequence of the 
images to define respective image co-ordinates of 
matching features to generate therefrom a set of model 
data defining model points in a 3-D space of the model 
and to obtain respective camera solutions representative 
of positions and orientations of virtual cameras in the 
3-D space defining views of the model corresponding to 
the images; 

an improvement comprising: 

means for adding a new image to the sequence and 
processing the new image to obtain a camera solution for 
a corresponding new virtual camera for use in generating 
further model data; 

wherein the means for processing of the new image comprises;

(a) means for identifying a plurality of image 
points in the new image which are matched to a respective 
plurality of image points of at least one preceding 
image of the sequence for which respective 3-D model data 
defining corresponding model points exists; 

(b) means for determining a set of 2-D image co- 
ordinates of the identified image points in the new image 
and co-ordinates of respective model points; and 

(c) means for processing the set of 2-D image point co-ordinates and respective 3-D model point co-ordinates to obtain the camera solution for the new image using a solving process in which the position and orientation of an image plane representative of the new virtual camera are calculated from a geometrical relationship in the 3-D model space between model points and image points defined by the set of co-ordinates.



27. In an apparatus for creating a 3-D model of an
object by processing images taken from a series of
respective camera positions relative to the object;

processing an initial sequence of the images to
define respective image co-ordinates of matching features
to generate therefrom a set of model data defining model
points in a 3-D space of the model and to obtain
respective camera solutions representative of positions
and orientations of virtual cameras in the 3-D space
defining views of the model corresponding to the images;
and

adding a new image to the sequence and processing
the new image to obtain a camera solution for a
corresponding new virtual camera for use in generating
further model data;

a method of performing the processing of the new
image comprising:

(a) identifying a plurality of image points in the 
new image which are matched to a respective plurality of 
image points of at least one preceding image of the 
sequence for which respective 3-D model data defining 
corresponding model points exists; 

(b) determining a set of 2-D image co-ordinates of 
the identified image points in the new image and co- 
ordinates of respective model points; and 

(c) processing the set of 2-D image point co- 
ordinates and respective 3-D model point co-ordinates to 
obtain the camera solution for the new image using a 
solving process in which the position and orientation of 
an image plane representative of the new virtual camera 
are calculated from a geometrical relationship in the 3-D 
model space between model points and image points defined 
by the set of co-ordinates. 

28. A computer program comprising processor 
implementable instructions for carrying out a method as 
claimed in any one of claims 1 to 12 and 25. 

29. A storage medium storing processor implementable 
instructions for carrying out a method as claimed in any 
one of claims 1 to 12 and 25. 
30. An electrical signal carrying processor
implementable instructions for carrying out a method as 
claimed in any one of claims 1 to 12 and 25. 



ABSTRACT 
IMAGE PROCESSING APPARATUS 
A 3-D model of an object is created by processing 
images taken from a series of camera positions. An 
initial sequence of the images is processed to define 
respective image co-ordinates of matching features to 
generate a set of model data defining model points in a 
3-D space of the model and to obtain respective camera 
solutions representative of positions and orientations of 
virtual cameras in the 3-D space defining views of the 
model corresponding to the images. A new image is added
to the sequence and processed to obtain a camera solution
for a corresponding new virtual camera for use in
generating further model data. Processing of the new
image comprises:

(a) identifying a plurality of image points in the 
new image which are matched to a respective plurality of 
image points of at least one preceding image of the 
sequence for which respective 3-D model data defining 
corresponding model points exists; 

(b) determining a set of 2-D image co-ordinates of
the identified image points in the new image and co- 
ordinates of respective model points; and 

(c) processing the set of 2-D image point co-ordinates
and respective 3-D model point co-ordinates to
obtain the camera solution for the new image using a
solving process in which the position and orientation of
an image plane representative of the new virtual camera
are calculated from a geometrical relationship in the 3-D
model space between model points and image points defined
by the set of co-ordinates.



Fig 2A
Fig 2B
Fig 3A
ACTUAL CAMERA POSITIONS RELATIVE TO THE OBJECT
Fig 4
PERFORMING MATCHING BETWEEN FEATURES OF NEW IMAGE AND PRECEDING IMAGE
Fig 5
MAPPING EXISTING MODEL POINTS INTO THE VIRTUAL IMAGE PLANE AT THE VIRTUAL CAMERA POSITION OF THE NEW IMAGE USING THE PROVISIONAL CAMERA SOLUTION (STEP 67)
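Mapping existing model points into the virtual image plane (step 67) can be illustrated with a standard pinhole projection. This is a minimal sketch assuming that a camera solution is stored as a rotation matrix `R` and translation `t` (so a model point X maps to camera co-ordinates R X + t), together with a focal length `f` and principal point `(cx, cy)`; these names and this parameterisation are assumptions, not taken from the description.

```python
Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def project(point: Vec3, R: Mat3, t: Vec3, f: float, cx: float = 0.0, cy: float = 0.0):
    """Project one model point into the image plane of an assumed pinhole camera.

    Returns (u, v), or None if the point lies on or behind the camera.
    """
    # Camera co-ordinates: x_cam = R X + t (assumed convention)
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        return None
    return (f * xc / zc + cx, f * yc / zc + cy)


def project_all(points: dict, R: Mat3, t: Vec3, f: float, cx: float = 0.0, cy: float = 0.0) -> dict:
    """Map every visible model point to a 2-D reference point, keyed by model point id."""
    reference = {}
    for pid, X in points.items():
        xy = project(X, R, t, f, cx, cy)
        if xy is not None:
            reference[pid] = xy
    return reference
```

With a provisional camera solution plugged in, `project_all` yields the 2-D reference points around which further matching can be constrained.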
FIG. 6A
(START)
SELECT 2-D TO 2-D MODE AND PERFORM MATCHING BETWEEN I_1, I_2, I_3
62 CALCULATE CAMERA SOLUTION FOR L_1, L_2, L_3 USING 2-D TO 2-D SOLVING PROCESS
CALCULATE 3-D MODEL DATA
ADD NEW IMAGE I_n TO SEQUENCE
PERFORM MATCHING FROM USER INPUT BETWEEN IMAGE FEATURES IN I_(n-1) AND IMAGE FEATURES IN I_n
65 IDENTIFY 3-D MODEL CO-ORDINATES FOR MATCHED 2-D IMAGE CO-ORDINATES IN I_n TO OBTAIN A SET OF 3D-2D CO-ORDINATES
CALCULATE PROVISIONAL RESULT FOR BEST CAMERA SOLUTION FOR L_n USING 3-D TO 2-D SOLVING PROCESS
USE PROVISIONAL RESULT FOR BEST CAMERA SOLUTION TO MAP MODEL DATA INTO I_n TO OBTAIN 2-D REFERENCE POINTS
PERFORM CONSTRAINED MATCHING IN I_n TO MATCH THE 2-D REFERENCE POINTS WITH ADDITIONAL FEATURES OF I_n
IDENTIFY 3-D MODEL CO-ORDINATES AND FORM ENLARGED SET OF MATCHED 2-D IMAGE CO-ORDINATES IN I_n
610 CALCULATE REVISED RESULT FOR CAMERA SOLUTION FOR L_n USING 3-D TO 2-D SOLVING PROCESS WITH ENLARGED SET
(DECISION) YES / NO
n = n+1
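The constrained matching step in Fig 6A can be approximated as a nearest-neighbour search around each 2-D reference point. In this hypothetical sketch, `reference_points` maps model point ids to projected co-ordinates, `features` maps detected feature ids in the new image to co-ordinates, and `radius` is an assumed search window in pixels. A practical matcher would probably also compare image patches; that is omitted here.

```python
import math


def constrained_match(reference_points: dict, features: dict, radius: float) -> dict:
    """Greedy constrained matching (assumed strategy, not from the description).

    For each projected reference point, accept the nearest unused feature lying
    within `radius`; returns a dict of model point id -> feature id.
    """
    matched = {}
    used = set()
    for model_id, (rx, ry) in reference_points.items():
        best_id, best_d = None, radius
        for fid, (fx, fy) in features.items():
            if fid in used:
                continue  # each image feature is matched at most once
            d = math.hypot(fx - rx, fy - ry)
            if d <= best_d:
                best_id, best_d = fid, d
        if best_id is not None:
            matched[model_id] = best_id
            used.add(best_id)
    return matched
```

Each accepted pair links an existing model point to a new-image feature, which is how the enlarged set for the revised 3-D to 2-D solution could be formed.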
FIG. 6B
ADDITIONAL STEPS
612 GENERATE ADDITIONAL MODEL DATA FROM MATCHED IMAGE FEATURE DATA ACCUMULATED IN CONCORDANCE TABLE
RECALCULATE CAMERA SOLUTIONS i = 1 TO n USING EXPANDED MODEL DATA SET AND 3-D TO 2-D SOLVING PROCESS
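Fig 6B does not state how additional model data is generated from the concordance table. One common option, used here purely as an assumed stand-in, is midpoint triangulation: back-project a matched feature from two solved cameras and take the midpoint of the shortest segment between the two rays. The camera parameterisation (`R`, `t`, `f`, `cx`, `cy`, with x_cam = R X + t) and the helper names are hypothetical.

```python
Vec3 = tuple[float, float, float]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_from_pixel(u, v, R, t, f, cx=0.0, cy=0.0):
    """Back-project pixel (u, v) to a world-space ray (centre, direction).

    Assumes x_cam = R X + t, so centre = -R^T t and direction = R^T d_cam.
    """
    d_cam = ((u - cx) / f, (v - cy) / f, 1.0)
    centre = tuple(-sum(R[j][i] * t[j] for j in range(3)) for i in range(3))
    direction = tuple(sum(R[j][i] * d_cam[j] for j in range(3)) for i in range(3))
    return centre, direction


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between rays c1 + s1*d1 and c2 + s2*d2."""
    w0 = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # rays (nearly) parallel: no stable intersection
    s1 = (b * e - c * d) / denom
    s2 = (a * e - b * d) / denom
    p1 = tuple(ci + s1 * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + s2 * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

For example, a point seen at pixel (0, 0) by a camera at the origin and at pixel (-1, 0) by an identically oriented camera centred at (1, 0, 0), both with f = 1, triangulates to (0, 0, 1).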







FIG. 7
PERFORMING MATCHING FROM USER INPUT BETWEEN IMAGE FEATURES IN I_(n-1) AND I_n (STEP 64)
DISPLAY I_(n-1) AND I_n INCLUDING INDICATORS IN I_(n-1) AT PREVIOUSLY MATCHING FEATURES CORRESPONDING TO EXISTING MODEL DATA
RECEIVE MATCHING SIGNALS FROM USER INPUT
(DECISION) NO
ADD CO-ORDINATES TO INITIAL SET FOR USE IN 3-D TO 2-D SOLVING
(DECISION) NO
74 ADD MATCHING DATA TO CONCORDANCE TABLE OF MATCHED IMAGE FEATURES FOR GENERATING MODEL DATA
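A concordance table of matched image features can be sketched as a union-find structure over (image index, feature id) keys, so that chains of pairwise matches merge into one track per object point. The class `ConcordanceTable` and its methods are hypothetical; the description does not specify how the table is stored.

```python
class ConcordanceTable:
    """Hypothetical table grouping matched features into tracks.

    Each track is meant to correspond to one physical point of the object;
    keys are (image_index, feature_id) tuples.
    """

    def __init__(self):
        self._parent = {}  # union-find forest over feature keys

    def _find(self, key):
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            self._parent[key] = self._parent[self._parent[key]]  # path halving
            key = self._parent[key]
        return key

    def add_match(self, a, b):
        """Record that features a and b image the same object point."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def tracks(self):
        """Return every track observed in at least two images, as sorted key lists."""
        groups = {}
        for key in list(self._parent):
            groups.setdefault(self._find(key), []).append(key)
        return [sorted(g) for g in groups.values() if len(g) >= 2]
```

Tracks that do not yet have a model point would be the natural candidates when additional model data is generated from the table.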







Fig 8
3-D TO 2-D SOLVING PROCESS (STEPS 66 AND 610)
(START)
SELECT AT RANDOM A SUBSET DEFINING APICES OF FIRST AND SECOND TRIANGLES IN MODEL SPACE
CALCULATE A FIRST CANDIDATE SOLUTION USING THE GEOMETRICAL RELATIONSHIP BETWEEN THE TRIANGLES
USING FIRST CANDIDATE SOLUTION, MAP MATCHED MODEL DATA INTO IMAGE I_n
CORRELATE WITH MATCHED POINT DATA TO OBTAIN THE NUMBER OF INLIERS FOR THE CANDIDATE SOLUTION
STORE CANDIDATE SOLUTION, INLIERS AND NUMBER OF INLIERS FOR THE BEST SOLUTION
(DECISION) NO
OUTPUT STORED RESULT AND INLIERS FOR BEST CANDIDATE SOLUTION
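The loop in Fig 8, read together with the RANSAC ALGORITHM block in Fig 10, follows the usual RANSAC pattern: compute a candidate from a random subset, count inliers, and keep the best candidate. The sketch below is a generic skeleton under that reading. The triangle-based candidate solver of the figure is not reproduced, so `fit`, `error`, `sample_size`, `threshold` and `iterations` are assumed parameters supplied by the caller. For the 3-D to 2-D case, `data` would hold (model point, image point) pairs and `error` would be a reprojection distance.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

D = TypeVar("D")  # one correspondence
M = TypeVar("M")  # one candidate solution


def ransac(
    data: Sequence[D],
    fit: Callable[[Sequence[D]], Optional[M]],  # candidate solver (assumed, caller-supplied)
    error: Callable[[M, D], float],             # residual of one datum under a candidate
    sample_size: int,
    threshold: float,
    iterations: int,
    rng: random.Random,
) -> tuple[Optional[M], list[D]]:
    """Generic RANSAC: return the best candidate and its inliers."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(list(data), sample_size)  # random subset
        model = fit(sample)                           # candidate solution
        if model is None:
            continue  # degenerate subset, no candidate
        inliers = [d for d in data if error(model, d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers  # store best so far
    return best_model, best_inliers
```

As a toy usage, with `data` as pairs of 2-D points related by a fixed translation plus a few outliers, a one-point `fit` returning the point difference recovers that translation and its inliers.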
Fig 10
IMAGE DATA
MODEL DATA
CONCORDANCE TABLE
INLIERS FOR BEST CANDIDATE CAMERA SOLUTION
2D-2D SOLVING PROCESS
3D-2D SOLVING PROCESS
RANSAC ALGORITHM